[Universal parser] Introduce rb_parser_ary_t #10004

hasumikin · 2024-02-17T01:35:25Z

This PR is part of the work to make Lrama-generating-parse.c a universal parser

Introduce rb_parser_ary_t struct to partly eliminate RArray from parse.y
parser_params->tokens and parser_params->ast->node_buffer->tokens becomes rb_parser_ary_t *
Instead, ast_node_all_tokens() internally creates a Ruby Array object from the rb_parser_ary_t *tokens

nobu · 2024-02-17T02:51:09Z

node.c

@@ -483,7 +511,7 @@ rb_ast_delete_mark_object(rb_ast_t *ast, VALUE obj)
 VALUE
 rb_ast_tokens(rb_ast_t *ast)
 {
-    return ast->node_buffer->tokens;
+    return (VALUE)ast->node_buffer->tokens;


Is rb_parser_ary_t compatible with VALUE?
This function is used in the implementation of RubyVM::AbstractSyntaxTree::Node#all_tokens, and the return value will be seen in Ruby level.

I reimplemented rb_ast_tokens() that constructs RArray from rb_parser_ary_t 8a986a2

nobu · 2024-02-17T02:53:04Z

parse.y

    p->token_id++;

    if (p->debug) {
-        rb_parser_printf(p, "Append tokens (line: %d) %"PRIsVALUE"\n", line, ary);
+        rb_parser_printf(p, "Append tokens (line: %d) %"PRIsVALUE"\n", line, (VALUE)token);


PRsVALUE calls to_s method on the corresponding argument.
rb_parser_ast_token_t does not seem to compatible with VALUE.

Thank you for the review!
I implemented rb_parser_str_escape() which does not depend on VALUE
15e42bc

@nobu Could you please review them again?

yui-knk · 2024-02-25T12:56:14Z

parse.y

+            ary->data = xrealloc(ary->data, sizeof(void *) * len);
+        }
+        for (i = ary->len; i < len; i++) {
+            ary->data[i] = Qnil;


Not Qnil but 0 (or NULL)?

Fixed in hasumikin@9c64383

yui-knk · 2024-02-25T12:56:53Z

parse.y

+        ary->data = NULL;
+    }
+    for (i = 0; i < len; i++) {
+        ary->data[i] = Qnil;


Not Qnil but 0 (or NULL)?

Fixed in hasumikin@9c64383

yui-knk · 2024-02-25T13:00:48Z

parse.y

+rb_parser_ary_push(rb_parser_t *p, rb_parser_ary_t *ary, VALUE val)
+{
+    if (ary->len == ary->capa) {
+        rb_parser_ary_extend(p, ary, ary->len + 1);


Expanding the array with ary->len + 1 is no efficient especially for ast token array because it's expected to be expanded again and again. How about to double the capacity?

Fixed: hasumikin@d1fd5ba
BTW, what does the comment // TODO in rb_ruby_parser_keep_tokens() mean?

It might to be "Remove dependency for Ruby Array", then please remove the // TODO comment in this PR, thank you.

Remove the line hasumikin@968c063

yui-knk · 2024-02-25T13:05:34Z

node.c

+parser_ast_token_free(rb_ast_t *ast, rb_parser_ast_token_t *token)
+{
+    if (!token) return;
+    parser_string_free(ast, (rb_parser_string_t *)token->str);


Is this cast for rb_parser_string_t * needed?

Fixed in hasumikin@9c64383

yui-knk · 2024-02-25T13:11:27Z

node.c

@@ -483,7 +523,32 @@ rb_ast_delete_mark_object(rb_ast_t *ast, VALUE obj)
 VALUE
 rb_ast_tokens(rb_ast_t *ast)
 {
-    return ast->node_buffer->tokens;
+    long i;


How about to move this ruby object construction logic to ast.c? By the change, node.c no need to handle VALUE.

Yeah, I thought that too but,
If we move the logic from node.c to ast.c, do you also think we should redesign ast->node_buffer->tokens into ast->tokens because we can say the tokens member no longer belongs to a Node?
(Of course, we cannot change the Ruby level design: AbstractSyntaxTree::Node though)

In a long term, it's correct however It's impossible right now. rb_ast_t is imemo (internal ruby object) now therefore it's size should be 5 words. It's already uses 5 words like below. We can refactor it once rb_ast_t is not managed as imemo.

typedef struct rb_ast_body_struct { const NODE *root; VALUE script_lines; // script_lines is either: // - a Fixnum that represents the line count of the original source, or // - an Array that contains the lines of the original source signed int frozen_string_literal:2; /* -1: not specified, 0: false, 1: true */ signed int coverage_enabled:2; /* -1: not specified, 0: false, 1: true */ } rb_ast_body_t; typedef struct rb_ast_struct { VALUE flags; node_buffer_t *node_buffer; rb_ast_body_t body; } rb_ast_t;

I did
hasumikin@89667cd

yui-knk · 2024-02-25T13:15:32Z

parse.y

-            rb_obj_freeze(tokens);
-            rb_ast_set_tokens(p->ast, tokens);
+            p->ast->node_buffer->tokens = tokens;
+            p->tokens = NULL;


Could you add free for rb_ruby_parser_free just in case, if an exception happens in the middle of parsing, this line can be skipped.

Is this OK? hasumikin@c2d3fc8 (I amend a function name parser_ary_free() to parser_token_free() for clarity)

parser_token_free is preferred, BTW the existence of ast and ast->node_buffer is confirmed by rb_ast_free. Therefore it's more simple to call parser_tokens_free directly in rb_node_buffer_free.

You are right. Fixed hasumikin@e9e3ac6

yui-knk · 2024-02-25T13:17:03Z

parse.y

    p->token_id++;

    if (p->debug) {
-        rb_parser_printf(p, "Append tokens (line: %d) %"PRIsVALUE"\n", line, ary);
+        rb_parser_printf(p, "Append tokens (line: %d) [%d, :%s, \"%s\", [%d, %d, %d, %d]]\n",
+                            line, token->id, token->type_name, rb_parser_str_escape(p, token->str)->ptr,


I guess rb_parser_string_t allocated in rb_parser_str_escape is not freed.

Thanks. Fixed in hasumikin@edd7c6d

parse.y

nobu · 2024-03-04T04:42:47Z

parse.y

+    for (i = 0; i < len; i++) {
+        ary->data[i] = 0;
+    }


xcalloc zero-fills as well as the standard calloc.

Thank you. Fixed

nobu · 2024-03-04T04:51:44Z

rubyparser.h

+ * Array-like object for parser
+ */
+typedef struct rb_parser_ary {
+    uintptr_t *data;


Why this data is a pointer to uintptr_t instead of rb_parser_ast_token_t?
Also val of rb_parser_ary_push.

We plan to have rb_parser_ary store elements of arbitrary types. However, after talking to Kaneko-san, we decided to limit it to the rb_parser_ast_token_t* type for now.
85ff09c
Thank you for the review!

- Introduce `rb_parser_ary_t` structure to partly eliminate RArray from parse.y - In this patch, `parser_params->tokens` and `parser_params->ast->node_buffer->tokens` are now `rb_parser_ary_t *` - Instead, `ast_node_all_tokens()` internally creates a Ruby Array object from the `rb_parser_ary_t` - Also, delete `rb_ast_tokens()` and `rb_ast_set_tokens()` in node.c - Implement `rb_parser_str_escape()` - This is a port of the `rb_str_escape()` function in string.c - `rb_parser_str_escape()` does not depend on `VALUE` (RString) - Instead, it uses `rb_parser_stirng_t *` - This function works when --dump=y option passed - Because WIP of the universal parser, similar functions like `rb_parser_tokens_free()` exist in both node.c and parse.y. Refactoring them may be needed in some way in the future - Although we considered redesigning the structure: `ast->node_buffer->tokens` into `ast->tokens`, we leave it as it is because `rb_ast_t` is an imemo. (We will address it in the future)

yui-knk · 2024-03-12T08:18:01Z

Thank you!

hasumikin changed the title ~~[]Introduce rb_parser_ary_t~~ [Universal parser] Introduce rb_parser_ary_t Feb 17, 2024

hasumikin force-pushed the rb_parser_ary branch from bc35630 to 5c1b674 Compare February 17, 2024 02:13

nobu requested changes Feb 17, 2024

View reviewed changes

hasumikin force-pushed the rb_parser_ary branch 2 times, most recently from 4cad33b to 8a986a2 Compare February 24, 2024 07:05

yui-knk reviewed Feb 25, 2024

View reviewed changes

hasumikin force-pushed the rb_parser_ary branch 2 times, most recently from 10e95cd to e9e3ac6 Compare February 29, 2024 08:41

hasumikin marked this pull request as ready for review February 29, 2024 09:43

nobu reviewed Mar 4, 2024

View reviewed changes

nobu approved these changes Mar 5, 2024

View reviewed changes

hasumikin force-pushed the rb_parser_ary branch 2 times, most recently from b5fffa1 to f12fb8d Compare March 6, 2024 01:51

hasumikin force-pushed the rb_parser_ary branch from f12fb8d to 4e9d5db Compare March 6, 2024 04:17

yui-knk approved these changes Mar 10, 2024

View reviewed changes

yui-knk merged commit 9a19cfd into ruby:master Mar 12, 2024
97 checks passed

hasumikin deleted the rb_parser_ary branch March 12, 2024 08:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Universal parser] Introduce rb_parser_ary_t #10004

[Universal parser] Introduce rb_parser_ary_t #10004

hasumikin commented Feb 17, 2024 •

edited

nobu Feb 17, 2024

hasumikin Feb 24, 2024

nobu Feb 17, 2024

hasumikin Feb 22, 2024

hasumikin Feb 29, 2024

yui-knk Feb 25, 2024 •

edited

hasumikin Feb 26, 2024

yui-knk Feb 25, 2024

hasumikin Feb 26, 2024

yui-knk Feb 25, 2024

hasumikin Feb 26, 2024 •

edited

yui-knk Feb 28, 2024

hasumikin Feb 29, 2024 •

edited

yui-knk Feb 25, 2024

hasumikin Feb 26, 2024

yui-knk Feb 25, 2024

hasumikin Feb 26, 2024

yui-knk Feb 28, 2024

hasumikin Feb 28, 2024

hasumikin Feb 29, 2024 •

edited

yui-knk Feb 25, 2024

hasumikin Feb 26, 2024

yui-knk Feb 28, 2024

hasumikin Feb 29, 2024

yui-knk Feb 25, 2024

hasumikin Feb 26, 2024

nobu Mar 4, 2024

hasumikin Mar 4, 2024

nobu Mar 4, 2024

hasumikin Mar 4, 2024

yui-knk commented Mar 12, 2024

[Universal parser] Introduce rb_parser_ary_t #10004

[Universal parser] Introduce rb_parser_ary_t #10004

Conversation

hasumikin commented Feb 17, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yui-knk Feb 25, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hasumikin Feb 26, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hasumikin Feb 29, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hasumikin Feb 29, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yui-knk commented Mar 12, 2024

hasumikin commented Feb 17, 2024 •

edited

yui-knk Feb 25, 2024 •

edited

hasumikin Feb 26, 2024 •

edited

hasumikin Feb 29, 2024 •

edited

hasumikin Feb 29, 2024 •

edited